Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PEP 427: Amend wheel's filename escaping rules #1824

Closed

Conversation

uranusjr
Copy link
Contributor

This builds on the last solution proposed on Discourse. It delegates most of the normalization logic to existing specifications, and only perform simple dash-to-underscore substitution in the wheel's filename. This makes the unescaping logic the easiest, while producing an esacping logic that's easy to implement (since wheel builders should already know how to produce valid .dist-info names).

cc @dholth @jwodder

This builds on the last solution proposed on Discourse[1], delegating
most normalization logic to existing specifications, and only perform
simple dash-to-underscore substitution in the wheel's filename. This
makes the unescaping logic the easiest, while producing an esacping
logic that's easy to implement (since wheel builders should already know
how to produce valid .dist-info names).

[1]: https://discuss.python.org/t/5605
@jwodder
Copy link
Contributor

jwodder commented Feb 19, 2021

This change implies that wheel filename components other than name and version (which have specifications restricting their character sets) can now contain any non-hyphen character. Is this intended?

@uranusjr
Copy link
Contributor Author

uranusjr commented Feb 19, 2021

Values in comapatibility tag fields don’t need to be unescaped (just need to compare them directly to a known compatibility list, so any filesystem-valid cahracters would work. This leaves the build tag. Since it is intended to be used as a tie-breaker using lexical ordering, again, any filesystem-valid cahracters would work.

I guess we can add a rule to limit all those fields to \w+. That’s already what everyone is doing (except the I don’t know any real-world usage of the build tag). But I don’t see much benefit in it, to be honest; the worst thing can happen is someone builds a wheel that can’t be downloaded due to filename restrictions, which project maintainers likely would avoid on their own anyway.

Copy link
Contributor

@gaborbernat gaborbernat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dholth
Copy link
Contributor

dholth commented Feb 19, 2021 via email

@uranusjr
Copy link
Contributor Author

@dholth Do you know what values they're using for the build tag?


re.sub("[^\w\d.]+", "_", distribution, re.UNICODE)
The ``distribution`` and ``version`` components of the filename should
follow the same rules as the ``.dist-info`` directory. [2]_ Each
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's good practice for specifications to contain their own definitions with minimal references to other documents, even if that means some duplication. E.g. if something changed about dist-info directories, it may be unclear whether the change also affects wheel filenames. Admittedly this is somewhat pedantic!

@dholth
Copy link
Contributor

dholth commented Feb 19, 2021

@gvanrossum
Copy link
Member

I'm confused. This is a standards track PEP from 2012 that has status "final". It is my understanding that you can't just amend that in place based on one thread on Discourse. I don't recognize any of the names in that thread. Has the packaging WG said anything about this?

@pfmoore
Copy link
Member

pfmoore commented Feb 20, 2021

@gvanrossum Packaging specifications are handled somewhat differently, see here.

Having said that, I do think that there needs to be more discussion on Discourse before this becomes a PEP change, and my initial feeling is that I'd actually rather see this change as the prompt to move the reference spec for wheels from the PEP into https://packaging.python.org/specifications/, freezing PEP 427 as a historical document.

But I need more time to think through the implications here so this PR shouldn't be merged yet.

Copy link
Member

@pfmoore pfmoore left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per my other comment, this needs further discussion on the Discourse thread.

@takluyver
Copy link
Contributor

From what I understand, setuptools and pip don't implement escaping like the PEP currently says, and probably never have. A lot of packaging specifications are write-ups of existing de-facto standards defined by the main tools, and far more people look at what pip & setuptools do than what a PEP says. So even as the maintainer of a tool which does follow the written spec, I'm not particularly surprised or disappointed that it's the written spec that would change in a situation like this.

@pfmoore
Copy link
Member

pfmoore commented Feb 20, 2021

Let's take further discussion back to Discourse, here.

For information, historically PEP 427 was the canonical definition of how wheels worked, the tools were built from the spec (unlike, as @takluyver mentioned, many other packaging specs which are write-ups of existing practices). Unfortunately, though, we weren't as experienced writing specs when PEP 427 was written, so it's a lot more "informal" than later ones, and as such needs tidying up and pinning down in some areas (the fact that we've managed to not need more revisions than we have is both a credit to the original work, and a result of us pulling out specifics of more complex areas into their own PEPs).

The escaping part of the wheel spec was written before most of the other standards, and I suspect it was mostly a practical concession to make sure that splitting on dashes worked. With the exception of project name, more recent standards provide better normalisation rules that (luckily!) still allow splitting on dashes. IMO, we should therefore defer to other specs when we can for normalisation rules. See my discourse post for more details.

@gvanrossum
Copy link
Member

Okay, it's up to @pfmoore to merge this if and when it's to his liking. I'll unsubscribe.

@brettcannon
Copy link
Member

I'm closing this for now since an actual decision has not been reached yet.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants